-
Notifications
You must be signed in to change notification settings - Fork 260
make 'opm serve' initialization asynchronous #977
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
make 'opm serve' initialization asynchronous #977
Conversation
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: joelanford The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
da0db50 to
0a337c5
Compare
Codecov Report
@@ Coverage Diff @@
## master #977 +/- ##
==========================================
+ Coverage 52.60% 55.27% +2.67%
==========================================
Files 104 130 +26
Lines 9422 10826 +1404
==========================================
+ Hits 4956 5984 +1028
- Misses 3536 3763 +227
- Partials 930 1079 +149
Continue to review full report at Codecov.
|
|
I think this is saying that However, we did intend for |
0a337c5 to
d101180
Compare
…t of grpc server Signed-off-by: Joe Lanford <[email protected]>
f860fd8 to
f658013
Compare
f658013 to
7db066c
Compare
Signed-off-by: Joe Lanford <[email protected]>
7db066c to
eb0d664
Compare
| .PHONY: unit | ||
| unit: | ||
| $(GO) test -coverprofile=coverage.out $(SPECIFIC_UNIT_TEST) $(TAGS) $(TEST_RACE) -count=1 ./pkg/... ./alpha/... | ||
| $(GO) test -coverprofile=coverage.out -coverpkg=./... $(SPECIFIC_UNIT_TEST) $(TAGS) $(TEST_RACE) -count=1 ./pkg/... ./alpha/... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change means we now consider cross-package coverage. (i.e. the tests in pkg/server/server_test.go can contribute to coverage in pkg/registry/query.go)
This very likely means there's an artificial increase in the repo-wide coverage, but I did my best to add specific unit tests related to this PR in pkg/registry/query_test.go to make sure the new code paths are relatively well-covered.
dinhxuanvu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good. Just one question.
|
|
||
| func (q Querier) ListBundles(ctx context.Context) ([]*api.Bundle, error) { | ||
| func (q *Querier) ListBundles(ctx context.Context) ([]*api.Bundle, error) { | ||
| if err := q.Wait(ctx); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the moment, gRPC client on OLM side doesn't have a deadline setting so the rRPC calls can wait here for a long time. I wonder if we should set a deadline now so that if it goes for too long, it will fail out with DEADLINE_EXCEED error. Or perhaps "waiting until ready regardless how long it is" is a desire behavior? I would love this to be tested in a real cluster to see if things work well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait has three cases when it returns:
- The passed in context was canceled (not sure of all the ways that context is propagated from the GRPC server that calls this function, one way is if the server is shutdown). Perhaps this could be the avenue to set a timeout on a per request basis?
- There was an error loading the FBC.
- The FBC was successfully loaded.
So if Wait blocks, it means we're still churning through the initialization and haven't encountered any issues yet.
I'll take a look at the request deadline possibilities.
|
/hold There are some implications on OLM that we need to consider:
|
Signed-off-by: Joe Lanford <[email protected]>
|
@joelanford: PR needs rebase. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
@joelanford is this issue on permanent hold or do you see this getting resolved as is? |
|
@jdockter I don't think it's permanently blocked. We just need to prioritize checking the implications of this change on clients of the GRPC server (catalog-operator and packageserver). |
|
Closing in favor of #1005 |
Signed-off-by: Joe Lanford [email protected]
Description of the change:
This PR modifies
opm serveso that loading FBC is handled asynchronously in the background so that the grpc server can start immediately, thus enabling the health probes to return successfully faster.All of the grpc endpoints that require the FBC to be loaded now block, waiting for the initialization to complete. Clients can now initiate a connection and the server will block the response until initialization completes. Clients that don't want to wait indefinitely can set timeouts on their requests.
Motivation for the change:
While we shipped a change in catalog-operator to make use of the startupProbe functionality, that fix will not solve problems caused by
opm serve's long startup times on clusters without the startupProbe.This fix solves that problem by ensuring the grpc server starts immediately (much like the sqlite-based grpc server).
Reviewer Checklist
/docs